Crate print_positions


The print_positions and print_position_data functions provide iterators which return “print positions”.

A print position is a generalization of the UAX#29 extended grapheme cluster that also includes the ANSI escape codes controlling the color and emphasis of the user-visible character. So a “print position” is an even longer multi-byte sequence that still represents a single user-visible character on the screen.

Example:

use print_positions::{print_positions, print_position_data};

// content is e with dieresis, displayed as black text on a green background,
// with a color reset at the end.
// Looks like 1 character on the screen.  See example "padding" to print one out.
let content = &["\u{1b}[30;42m", "\u{0065}", "\u{0308}", "\u{1b}[0m"].join("");

// access number of print positions without examining the content
assert_eq!(print_positions(content).count(), 1);
 
let segmented: Vec<_> = print_position_data(content).collect();
assert_eq!(content.len(), 15);    // content is 15 bytes long
assert_eq!(segmented.len(), 1);   // but only 1 print position
 

Rationale:

In the good old days, a “character” was a simple entity. It would always fit into one octet (or perhaps only a sextet). You could access the i’th character in a string by accessing the i’th element of its array.
And you could process characters in any human language you wanted, as long as it was (transliterated into) English.

Modern applications must support multiple natural languages, and some are rendered on an ANSI-compatible screen (or, less often, a print device). So it’s a given that what a user would consider a simple “character”, visible as a single glyph on the screen, is represented in memory by a variable number of bytes.

This crate provides a tool to make it once again easy to access the i’th “character” of a word on the screen by indexing to the i’th element of an array, but the array now consists of “print positions” rather than bytes or primitive type chars. See iterator PrintPositionData.
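To make the indexing idea concrete, here is a minimal std-only sketch. It is not the crate's implementation: `simple_print_positions` is a hypothetical helper that recognizes only CSI escape sequences (ESC `[` ... final byte) and the combining marks U+0300..=U+036F, whereas the crate follows the full UAX#29 rules. It also assumes that escape sequences attach to the nearest preceding visible character, or to the first one when they lead the string.

```rust
/// Hypothetical, simplified stand-in for print-position segmentation.
/// Handles only CSI escape sequences and combining marks U+0300..=U+036F.
fn simple_print_positions(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0; // byte offset where the current print position begins
    let mut has_base = false; // has the current position seen a visible char?
    let mut chars = s.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        // Skip a CSI sequence: ESC '[' parameters... final byte in 0x40..=0x7E.
        if c == '\u{1b}' && matches!(chars.peek(), Some(&(_, '['))) {
            chars.next(); // consume '['
            while let Some((_, p)) = chars.next() {
                if ('\u{40}'..='\u{7e}').contains(&p) {
                    break;
                }
            }
            continue;
        }
        let is_combining = ('\u{0300}'..='\u{036f}').contains(&c);
        if !is_combining {
            if has_base {
                // A new visible character closes the previous print position.
                out.push(&s[start..i]);
                start = i;
            }
            has_base = true;
        }
    }
    // Emit the final position, including any trailing escape sequences.
    if has_base {
        out.push(&s[start..]);
    }
    out
}
```

Collecting the slices into a `Vec<&str>` is what makes "the i’th character" an ordinary array lookup again: for `"he\u{0301}llo"`, element 1 is the two-code-point slice `"e\u{0301}"`, not a lone `e`.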

Sometimes you don’t even need to access the character data itself; you just want to know how many visible columns it will consume on the screen, in order to align it with other text or within a fixed area on the screen. See iterator PrintPositions.
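As an illustration of the alignment use case, the following std-only sketch pads a string to a fixed number of visible columns. Both functions are hypothetical and are not part of this crate. The counter skips only CSI escape sequences and the combining marks U+0300..=U+036F, and it assumes every print position occupies exactly one column, which does not hold for wide characters.

```rust
/// Hypothetical counter: number of visible positions in `s`, ignoring CSI
/// escape sequences and combining marks U+0300..=U+036F (a simplification).
fn count_visible_positions(s: &str) -> usize {
    let mut count = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Consume parameters up to and including the final byte.
            for p in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&p) {
                    break;
                }
            }
            continue;
        }
        if !('\u{0300}'..='\u{036f}').contains(&c) {
            count += 1;
        }
    }
    count
}

/// Hypothetical helper: right-pad `s` with spaces to `width` visible columns.
/// Strings that are already wider than `width` are returned unchanged.
fn pad_to_width(s: &str, width: usize) -> String {
    let used = count_visible_positions(s);
    let mut out = String::from(s);
    out.extend(std::iter::repeat(' ').take(width.saturating_sub(used)));
    out
}
```

Padding by byte or `char` count would under-pad colored text, because the escape codes count toward the length without taking up any columns on the screen.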

Structs

  • PrintPositionData: This iterator returns “print position” data found in a string, as an immutable slice within the source string.
  • PrintPositions: This iterator identifies print positions in the source string and returns start and end offsets of the data rather than the data itself. See PrintPositionData if you want to iterate through the data instead.

Functions